Assigning Inflectional Paradigms to Named Entities by Linear Successive Abstraction
Abstract
This paper describes how a supervised learning method is used to assign inflectional paradigms to organizational named entities, the main prerequisite for generating a morphological lexicon of these entities. An inflectional paradigm is a set of rules for generating all forms of a lexicon entry. A morphological lexicon consists of lexicon entries and their corresponding forms. This type of language resource is crucial in tasks such as natural language generation (generating natural language business news from database data and news templates) and named entity identification (a necessary step in data mining and business intelligence). The basic resource used in this research is a list of 106,530 named entities of organizations, given in their basic form (nominative) and ranked by relevance. The first 5,000 named entities were tagged manually, and 59 inflectional paradigm classes are defined on them. Using linear successive abstraction, a suffix model is trained, validated and tested on this tagged dataset. Morphological lexica of general language, personal names and settlements are used as additional resources in the decision process. The accuracy achieved on the test set is 98.70%.
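The suffix-model idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the interpolation weights, the maximum suffix length, or how the additional lexica enter the decision, so the fixed weight `theta`, the limit `max_len`, the toy names and the paradigm labels `P_a` and `P_cons` are all assumptions made for illustration. The sketch starts from the overall class distribution and, for each longer word-final suffix seen in training, interpolates that suffix's relative frequencies with the estimate from the shorter suffix.

```python
from collections import Counter, defaultdict


def train_suffix_model(tagged, max_len=5):
    """Count paradigm labels for every word-final suffix of length 0..max_len."""
    counts = defaultdict(Counter)
    for name, paradigm in tagged:
        word = name.lower()
        for k in range(0, min(max_len, len(word)) + 1):
            suffix = word[len(word) - k:]  # k == 0 gives the empty suffix (class prior)
            counts[suffix][paradigm] += 1
    return counts


def predict(counts, name, max_len=5, theta=1.0):
    """Back off from longer to shorter suffixes with a fixed weight theta (assumed)."""
    word = name.lower()
    prior = counts[""]
    total = sum(prior.values())
    labels = list(prior)
    # Most general estimate: relative class frequencies over all training names.
    probs = {c: prior[c] / total for c in labels}
    for k in range(1, min(max_len, len(word)) + 1):
        seen = counts.get(word[len(word) - k:])
        if not seen:
            break  # longer suffixes cannot have been seen either
        n = sum(seen.values())
        # Blend the suffix-specific frequency with the shorter-suffix estimate.
        probs = {c: (seen[c] / n + theta * probs[c]) / (1 + theta) for c in labels}
    return max(probs, key=probs.get)


# Hypothetical toy data; the labels are placeholders, not the paper's 59 classes.
toy = [
    ("banka", "P_a"), ("tvornica", "P_a"), ("agencija", "P_a"),
    ("institut", "P_cons"), ("fond", "P_cons"), ("ured", "P_cons"),
]
model = train_suffix_model(toy)
print(predict(model, "zadruga"))
print(predict(model, "zavod"))
```

A fixed `theta` keeps the sketch short; a weight that depends on how much data supports each suffix would be a natural refinement, and which variant the paper uses is not stated in the abstract.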
Similar resources
Generating a Morphological Lexicon of Organization Entity Names
This paper describes methods used for generating a morphological lexicon of organization entity names in Croatian. This resource is intended for two primary tasks: template-based natural language generation and named entity identification. The main problems concerning the lexicon generation are the high level of inflection in Croatian and the low linguistic quality of the primary resource containing na...
Global Inference for Entity and Relation Identification via a Linear Programming Formulation
Natural language decisions often involve assigning values to sets of variables, representing low level decisions and context dependent disambiguation. In most cases there are complex relationships among these variables representing dependencies that range from simple statistical correlations to those that are constrained by deeper structural, relational and semantic properties of the text. In t...
Inflectional paradigms have bases too: Arguments from Yiddish
It is well known that the phonological form of a word can depend on its morphological structure. In serial approaches, this follows naturally from the fact that words have derivational histories: morphologically complex words undergo successive levels of phonology as they are constructed, making them eligible for different phonological processes along the way. A crucial distinction is typically m...
Using Wikipedia as a Resource for Arabic Named Entity Recognition
In this paper we describe a novel approach to delimit named entities and nominal successive mentions in the Arabic Wikipedia. To achieve this, a linguistic analysis of named entities and successive mentions in the Arabic Wikipedia, in terms of coverage and complexity, is presented. A supervised machine learning classifier has been used to predict the presence of the named entities in the Arabic ...
Sub-optimal Paradigms in Yiddish
It is well known that the phonological form of a word can depend on its morphological structure. In serial approaches, this follows naturally from the fact that words have derivational histories: morphologically complex words undergo successive levels of phonology as they are constructed. A crucial distinction is typically made, however, between derivational and inflectional morphology. Whereas...